On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization SUPPLEMENTARY MATERIAL

Authors

  • André M. S. Barreto
  • Doina Precup
  • Joelle Pineau
Abstract

This is the supplementary material for the paper entitled “On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization” [2]. It contains the details of our theoretical developments that could not be included in the paper due to space constraints. This material should be read in conjunction with the main paper.

1 Preliminaries

• Similarly to Ormoneit and Sen [3], we define a “mother kernel” $\phi(x): \mathbb{R}^+ \mapsto \mathbb{R}^+$ satisfying:
  (i) $\phi(x)$ is continuous in $\mathbb{R}^+$;
  (ii) $\int_0^\infty \phi(x)\,dx \le L_\phi < \infty$;
  (iii) $\phi(x) \ge \phi(y)$ if $x < y$;
  (iv) $\exists\, A_\phi, \lambda_\phi > 0$, $\exists\, B_\phi \ge 0$ such that $A_\phi \exp(-x) \le \phi(x) \le \lambda_\phi A_\phi \exp(-x)$ if $x \ge B_\phi$.

  Remarks:
  – Assumption (i) is implied by Ormoneit and Sen’s [3] assumption that $\phi$ is Lipschitz continuous. Ormoneit and Sen also assume that $\int_0^1 \phi(z)\,dz = 1$ (see Appendix A.1 in [3]).
  – Assumption (iv) implies that the kernel function $\phi$ eventually decays exponentially, and also that $\phi(z) > 0$ for all $z \in \mathbb{R}^+$.

• Let $S \subset [0,1]^d$ and let $\|\cdot\|$ be a norm in $\mathbb{R}^d$. Then we define $k_\tau(s, s') = \phi\left(\|s - s'\|/\tau\right)$, where $\tau > 0$ is the “width” of the kernel $k_\tau$.

• Let $M$ be a Markov decision process (MDP) with state space $S$, and let $S^a = \{(s_k^a, r_k^a, \hat{s}_k^a) \mid k = 1, 2, \dots, n_a\}$ be a set of sample transitions associated with action $a \in A$, where $s_k^a, \hat{s}_k^a \in S$ and $r_k^a \in \mathbb{R}$. We define the normalized kernel function associated with action $a$ as $\kappa_\tau^a(s, s_i^a) = k_\tau(s, s_i^a) \,/\, \sum_{j=1}^{n_a} k_\tau(s, s_j^a)$ (a numerical sketch of these definitions follows this list).

• Let $\bar{S} \equiv \{\bar{s}_1, \bar{s}_2, \dots, \bar{s}_m\}$ be a set of representative states in $S$. Define:
  – $\hat{s}^* \equiv \hat{s}_k$ with $k = \operatorname{argmax}_i \min_j \|\hat{s}_i - \bar{s}_j\|$, ...
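The following minimal Python sketch illustrates the definitions above. It assumes the simplest mother kernel compatible with assumptions (i)–(iv), $\phi(x) = \exp(-x)$ (which satisfies (iv) with $A_\phi = \lambda_\phi = 1$ and $B_\phi = 0$), the Euclidean norm, and synthetic sample transitions; the helper names (phi, k_tau, kappa_a, s_star) are illustrative choices, not notation from the paper.

import numpy as np

def phi(x):
    # Mother kernel: exp(-x) is continuous, integrates to 1 on R+,
    # is non-increasing, and meets the exponential-decay bounds of
    # assumption (iv) with A_phi = lambda_phi = 1 and B_phi = 0.
    return np.exp(-x)

def k_tau(s, s_prime, tau):
    # k_tau(s, s') = phi(||s - s'|| / tau); Euclidean norm assumed.
    return phi(np.linalg.norm(s - s_prime) / tau)

def kappa_a(s, starts_a, tau):
    # Normalized kernel kappa_tau^a(s, s_i^a): one weight per sampled
    # start state s_i^a of action a; the weights sum to 1.
    w = np.array([k_tau(s, s_i, tau) for s_i in starts_a])
    return w / w.sum()

def s_star(successors, rep_states):
    # hat{s}^* = hat{s}_k with k = argmax_i min_j ||hat{s}_i - bar{s}_j||:
    # the sampled successor state farthest from the representative set.
    gaps = [min(np.linalg.norm(s_hat - s_bar) for s_bar in rep_states)
            for s_hat in successors]
    return successors[int(np.argmax(gaps))]

# Toy usage with synthetic transitions in S = [0, 1]^2.
rng = np.random.default_rng(0)
starts = rng.random((5, 2))         # s_1^a, ..., s_5^a
succ = rng.random((5, 2))           # hat{s}_1^a, ..., hat{s}_5^a
rep = rng.random((3, 2))            # bar{s}_1, ..., bar{s}_3
s = np.array([0.5, 0.5])
print(kappa_a(s, starts, tau=0.2))  # nonnegative weights summing to 1
print(s_star(succ, rep))            # successor worst-covered by rep states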

Similar resources

On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization

Kernel-based stochastic factorization (KBSF) is an algorithm for solving reinforcement learning tasks with continuous state spaces which builds a Markov decision process (MDP) based on a set of sample transitions. What sets KBSF apart from other kernel-based approaches is the fact that the size of its MDP is independent of the number of transitions, which makes it possible to control the trade-...


Reinforcement Learning using Kernel-Based Stochastic Factorization

Kernel-based reinforcement-learning (KBRL) is a method for learning a decision policy from a set of sample transitions which stands out for its strong theoretical guarantees. However, the size of the approximator grows with the number of transitions, which makes the approach impractical for large problems. In this paper we introduce a novel algorithm to improve the scalability of KBRL. We resor...


Practical Kernel-Based Reinforcement Learning

Kernel-based reinforcement learning (KBRL) stands out among approximate reinforcement learning algorithms for its strong theoretical guarantees. By casting the learning problem as a local kernel approximation, KBRL provides a way of computing a decision policy which is statistically consistent and converges to a unique solution. Unfortunately, the model constructed by KBRL grows with the number...


Tree-Based On-Line Reinforcement Learning

Fitted Q-iteration (FQI) stands out among reinforcement learning algorithms for its flexibility and ease of use. FQI can be combined with any regression method, and this choice determines the algorithm’s statistical and computational properties. The combination of FQI with an ensemble of regression trees gives rise to an algorithm, FQIT, that is computationally efficient, scalable to high dimen...


Inverse Kinematics On-line Learning: a Kernel-Based Policy-Gradient approach

In machine learning, “kernel methods” give a consistent framework for applying the perceptron algorithm to non-linear problems. In reinforcement learning, an analog of the perceptron delta-rule can be derived from the “policy-gradient” approach proposed by Williams in 1992 in the framework of stochastic neural networks. Despite its generality and straightforward applicability to continuous comma...




Publication date: 2012